
[LinalgExt] Added toggle for using useExp2 for onlineAttention Decomposition#22778

Closed
keshavvinayak01 wants to merge 72 commits into iree-org:main from keshavvinayak01:users/keshavvinayak01/onlineattention-useexp2-toggle

Conversation

@keshavvinayak01
Contributor

@keshavvinayak01 keshavvinayak01 commented Nov 27, 2025

Following the discussion from #22441

Depending on the backend, certain computations may benefit from directly using exp instead of exp2, since FP reassociation can introduce accuracy losses. It's helpful to add a flag in case the user traces losses to this particular computation and prefers to use exp directly.

The use_exp2 flag is mostly unused in dialect conversions and passes; I presume it's used as a KernelOption. The changes here do not modify the default behavior.
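For context, the exp2 rewrite relies on the identity exp(x) = 2^(x · log2 e); the extra multiply by log2 e is where rounding (and any FP reassociation) can perturb results. A minimal Python sketch of the rewrite (illustrative only, not IREE code):

```python
import math

LOG2E = math.log2(math.e)  # ~1.4426950408889634

def exp_via_exp2(x: float) -> float:
    # exp(x) rewritten as 2**(x * log2(e)); the extra multiply is where
    # floating-point rounding can make the result differ from exp(x).
    return 2.0 ** (x * LOG2E)

x = 3.7
direct = math.exp(x)
rewritten = exp_via_exp2(x)
# The two agree to near machine precision, but not necessarily bit-exactly.
assert abs(direct - rewritten) / direct < 1e-12
```

Backends often prefer exp2 because hardware exposes a fast base-2 exponential, which is why the rewrite is the default.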

…ionOp -> LinalgExt::AttentionOp

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
@keshavvinayak01 keshavvinayak01 changed the title [LinalgExt] Added toggle for using useexp2 for onlineAttention Decomposition [LinalgExt] Added toggle for using useExp2 for onlineAttention Decomposition Nov 27, 2025
@keshavvinayak01 keshavvinayak01 marked this pull request as ready for review November 27, 2025 06:24
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Collaborator

@MaheshRavishankar MaheshRavishankar left a comment


I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

@keshavvinayak01
Contributor Author

I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

I was going to do that, but then I saw

%result = iree_linalg_ext.attention {decomposition_config = {pv_attrs = {x}, qk_attrs = {y}, use_exp2}, indexing_maps = [#map, #map1, #map2, #map3, #map4], compilation_info = #compilation} ins(%arg0, %arg1, %arg2, %arg3 : tensor<2x10x6x4xf16>, tensor<2x10x4x4xf16>, tensor<2x10x4x4xf16>, f16) outs(%init : tensor<2x10x6x4xf16>) {

where it's part of the decomposition config itself. So I thought I'd refine that attribute and keep using it. Making it an optional attribute of the op itself might introduce redundancy, since it already exists there. @MaheshRavishankar?

@MaheshRavishankar
Collaborator

I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

I was going to do that, but then I saw

%result = iree_linalg_ext.attention {decomposition_config = {pv_attrs = {x}, qk_attrs = {y}, use_exp2}, indexing_maps = [#map, #map1, #map2, #map3, #map4], compilation_info = #compilation} ins(%arg0, %arg1, %arg2, %arg3 : tensor<2x10x6x4xf16>, tensor<2x10x4x4xf16>, tensor<2x10x4x4xf16>, f16) outs(%init : tensor<2x10x6x4xf16>) {

where it's part of the decomposition config itself. So I thought I'd refine that attribute and keep using it. Making it an optional attribute of the op itself might introduce redundancy, since it already exists there. @MaheshRavishankar?

I don't know the history of that, but we probably need to drop the old usage and just add an optional attribute here. @Groverkss, comments?

@Groverkss
Contributor

I think it would be simpler to just add an optional attribute to the op itself to use exp2 for decomposition. This doesn't need to be a separate attribute that is not part of the op definition.

I was going to do that, but then I saw

%result = iree_linalg_ext.attention {decomposition_config = {pv_attrs = {x}, qk_attrs = {y}, use_exp2}, indexing_maps = [#map, #map1, #map2, #map3, #map4], compilation_info = #compilation} ins(%arg0, %arg1, %arg2, %arg3 : tensor<2x10x6x4xf16>, tensor<2x10x4x4xf16>, tensor<2x10x4x4xf16>, f16) outs(%init : tensor<2x10x6x4xf16>) {

where it's part of the decomposition config itself. So I thought I'd refine that attribute and keep using it. Making it an optional attribute of the op itself might introduce redundancy, since it already exists there. @MaheshRavishankar?

I don't know the history of that, but we probably need to drop the old usage and just add an optional attribute here. @Groverkss, comments?

I think it's okay to have a decomposition config dictionary and add these attributes to it. The attention op needs multiple configuration options, so it's useful to have a dictionary.

@Groverkss
Contributor

@MaheshRavishankar Can you have a look at this again?

Contributor Author

@keshavvinayak01 keshavvinayak01 left a comment


Let's re-run CI on this and get it merged? @Groverkss

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
kuhar and others added 14 commits January 20, 2026 05:47
…pass SwizzleHintOps (#23084)

This is the second of a series of PRs that together implement support in
IREE for XOR swizzling through the SwizzleHintOp.

There are four PRs that need to be merged:
1) Allow rank > 1 swizzle hint op operands and add a pass to flatten
swizzle hint allocs.
2) Add patterns which can fold reshapes and `extract_slice` ops into
empty ops through swizzle hint ops.
3) Add swizzle hint attribute to be set in `lowering_config` and
consumed in `GPUPromoteMatmulOperandsPass`.
4) Update `LLVMGPUSelectLoweringStrategy` Pass to set xor swizzles for
MXFP4 GEMMs.

This is PR 2, which does two things:
- Duplicates folding patterns for the tensor.empty op from upstream llvm-project in IREE, but with support for swizzle hint ops.
- Adds these patterns to the `GPUApplyTilingPass`.

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
This is the first of a series of PRs that together implement support in
IREE for XOR swizzling through the SwizzleHintOp.

There are four PRs that need to be merged:
1) Allow rank > 1 swizzle hint op operands and add a pass to flatten
swizzle hint allocs.
2) Add patterns which can fold reshapes and `extract_slice` ops into
empty ops through swizzle hint ops.
3) Add swizzle hint attribute to be set in `lowering_config` and
consumed in `GPUPromoteMatmulOperandsPass`.
4) Update `LLVMGPUSelectLoweringStrategy` Pass to set xor swizzles for
MXFP4 GEMMs.

This is PR 1, which does three things:
- Loosens the restriction on SwizzleHintOp inputs needing to be a Shaped
type of rank 1. We do this because things are a lot simpler during
tiling when you can fold arbitrary shapes into the swizzle hint op and
then flatten later.
- Introduces a pass to flatten allocs associated to `SwizzleHintOps`.
- Moves the verification of flatness of swizzle hint ops to the
`ResolveSwizzleHintOps` pass, prior to removal.

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
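As a rough illustration of what XOR swizzling does (this is a generic sketch, not IREE's actual swizzle pattern), XORing the low bits of the row into the column index permutes columns per row, which is the standard trick for spreading accesses across shared-memory banks:

```python
def xor_swizzle(row: int, col: int, bits: int = 3) -> int:
    # Hypothetical tile swizzle: XOR the low bits of the row into the
    # column index. Each row gets a distinct permutation of its columns.
    mask = (1 << bits) - 1
    return col ^ (row & mask)

# Within an 8x8 tile, every row maps columns 0..7 to a permutation of 0..7,
# so no data is lost, but a fixed column no longer hits a fixed bank.
for row in range(8):
    assert sorted(xor_swizzle(row, col) for col in range(8)) == list(range(8))
```

Because the swizzle is its own inverse per row, writes and reads that apply the same function stay consistent; the SwizzleHintOp described above carries this kind of mapping through the compiler.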
You can enable it with `-DIREE_REVERSE_ITERATION=On`.

I found 4 failing tests but there might be more non-determinism.
```
iree/compiler/Dialect/Stream/Transforms/test/automatic_reference_counting.mlir
iree/compiler/Dialect/Stream/Transforms/test/automatic_reference_counting_scf.mlir
iree/compiler/Dialect/Util/Transforms/test/hoist_into_globals.mlir
iree/compiler/GlobalOptimization/test/hoist_into_globals.mlir
```

Once fixed, I plan to enable this in CI.
Pass booleans instead of `nullptr`; the latter confuses some compilers because both `bool` and `Value` are constructible from `nullptr`.

Also clean up comments and needlessly complicated code just above.

Fixes: #23164
… modified (#23168)

* Updates the torch_ops configuration file to skip running some tests (new tests added without a golden_value, and a new failing test that was not skipped).
* Adds a new rule to configure_ci.py to run torch tests whenever configuration files are modified; otherwise one needs to remember to add ci-extra to run the relevant tests. (onnx and sharktank are not included here since they are always run on pre-submit.)
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Adds a pass to remove iree_codegen.index_hint operations. The pass unconditionally drops all index_hint ops; it should run once the compiler is done using them for optimizations, since the ops can get in the way of later transformations.

The pass is not added to any pipelines, because we are not generating
index_hint ops anywhere yet, but this pass will be added later once
index_hints start to be used.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Enable tests that were previously excluded but now pass:

ROCM/HIP (tests/e2e/linalg):
- conv2d, narrow_n_matmuls, subbyte_to_fp, fp_to_subbyte,
fp4_f32_conversion, index

VMVX (tests/e2e/linalg):
- argmax, index

VMVX (tests/e2e/linalg_ext_ops):
- attention

Vulkan (tests/e2e/linalg):
- argmax, index

Vulkan (tests/e2e/linalg_ext_ops):
- map_gather, map_scatter, top-k

Vulkan (tests/e2e/stablehlo_ops):
- reverse

Below is the additional testing time on my machine (using gfx1100):

```
● Test execution times for newly enabled tests:
  ┌──────────┬───────┬────────────┐
  │ Backend  │ Tests │ Total Time │
  ├──────────┼───────┼────────────┤
  │ ROCM/HIP │ 6     │ 3.06 sec   │
  ├──────────┼───────┼────────────┤
  │ VMVX     │ 3     │ 0.28 sec   │
  ├──────────┼───────┼────────────┤
  │ Vulkan   │ 6     │ 0.58 sec   │
  ├──────────┼───────┼────────────┤
  │ Total    │ 15    │ ~3.9 sec   │
  └──────────┴───────┴────────────┘
  Individual test breakdown:

  ROCM/HIP:
  - conv2d: 0.28s
  - fp4_f32_conversion: 0.39s
  - fp_to_subbyte: 0.43s
  - index: 0.27s
  - narrow_n_matmuls: 0.97s
  - subbyte_to_fp: 0.72s

  VMVX:
  - argmax: 0.04s
  - index: 0.04s
  - attention: 0.20s

  Vulkan:
  - argmax: 0.05s
  - index: 0.05s
  - map_gather: 0.13s
  - map_scatter: 0.12s
  - top-k: 0.19s
  - reverse: 0.05s

  All tests are fast (under 1 second each). The slowest is narrow_n_matmuls on ROCM at ~1 second.
```

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Injects iree_codegen.index_hint ops on offsets in the
populateOperandOffsetsSizesStrides functions for MMAAttrs. We inject the
hints here, because the semantic information about the offsets is
readily available, and can easily carry down to the later optimization
pass that converts loads into transpose loads using these hints. These
hints are intended for load to transpose load optimizations, but they
are set unconditionally regardless of transpositions for simplicity. The
later optimization pass is responsible for determining when the loads
are transposed, since it is more explicit at that point.

The hint ops will be dropped right after LLVMGPULowerExecutableTarget,
since at that point the index_hint ops should already have been used.
Currently, the pass that consumes these hint ops is not enabled, so the
hint ops will be doing nothing until the pass is added.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
I don't want to add too many CI workflows, so this is added together with ubsan.
This is hard to test for because only the (dynamic) host feature list is
unordered, unlike features for a specific target, and we can't assume a
specific host in tests.
* Use `llvm::IsaPred<T>` instead of lambdas where possible
* `!any_of` --> `none_of`
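The `!any_of` → `none_of` rewrite is a pure readability change; the two forms are logically equivalent. A quick sketch of the identity in Python (`llvm::none_of` is the actual C++ utility; this only demonstrates the logic):

```python
def none_of(xs, pred):
    # Logically identical to `not any(pred(x) for x in xs)`, but states
    # the intent positively, mirroring the llvm::none_of cleanup.
    return all(not pred(x) for x in xs)

evens = [2, 4, 6]
is_odd = lambda x: x % 2 == 1
assert none_of(evens, is_odd)
assert none_of(evens, is_odd) == (not any(is_odd(x) for x in evens))
assert not none_of([1, 2], is_odd)
```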
@keshavvinayak01 keshavvinayak01 force-pushed the users/keshavvinayak01/onlineattention-useexp2-toggle branch from 8b4a278 to b47101f Compare January 20, 2026 05:53
@keshavvinayak01 keshavvinayak01 deleted the users/keshavvinayak01/onlineattention-useexp2-toggle branch January 20, 2026 06:11
@MaheshRavishankar
Collaborator

What happened here? Why did you close this?

@keshavvinayak01
Contributor Author

I was trying to rebase and push to trigger CI, but the git history got messed up. So I re-opened it as #23211.

@hanhanW
Contributor

hanhanW commented Jan 20, 2026

Next time, you can check out the #23211 branch and run `git checkout -B this-branch`. It will overwrite the current branch's history, and you can re-open this PR.

Merging a PR like that introduces overhead for future code tracking, IMO. It forces people to click through many links to find old review comments and the reasons behind the changes.
